Abstract

We give two different Johnson-Lindenstrauss distributions, each with column sparsity $s =
\Theta(\varepsilon^{-1}\log(1/\delta))$ and embedding into optimal dimension $k = O(\varepsilon^{-2}\log(1/\delta))$ to achieve
distortion $1 \pm \varepsilon$ with probability $1 - \delta$. That is, only an $O(\varepsilon)$-fraction of entries are non-zero in
each embedding matrix in the supports of our distributions. These are the first distributions
to provide $o(k)$ sparsity for all values of $\varepsilon, \delta$. Previously the best known construction obtained
$s = \tilde{\Theta}(\varepsilon^{-1}\log^2(1/\delta))$ [Dasgupta-Kumar-Sarlos, STOC 2010]. One of our distributions can be
sampled from using $O(\log(1/\delta)\log d)$ random bits.

Some applications that use Johnson-Lindenstrauss embeddings as a black box, such as those
in approximate numerical linear algebra ([Sarlos, FOCS 2006], [Clarkson-Woodruff, STOC
2009]), require exponentially small $\delta$. Our linear dependence on $\log(1/\delta)$ in the sparsity is
thus crucial in these applications to obtain speedup.

1 Introduction 

The randomized Johnson-Lindenstrauss lemma states: 

Lemma 1 (JL Lemma [18]). For any integer $d > 0$, and any $0 < \varepsilon, \delta < 1/2$, there exists a
probability distribution on $k \times d$ real matrices for $k = O(\varepsilon^{-2}\log(1/\delta))$ such that for any $x \in \mathbb{R}^d$
with $\|x\|_2 = 1$, $\Pr_S[\,|\|Sx\|_2^2 - 1| > \varepsilon\,] < \delta$.

Proofs of the JL lemma can be found in a number of works (see, e.g., [23]). The value of $k$ in the
JL lemma is known to be optimal [17] (also see a later proof in [19]).

The JL lemma is a key ingredient in the JL flattening theorem, which states that any $n$ points
in Euclidean space can be embedded into $O(\varepsilon^{-2}\log n)$ dimensions so that all pairwise Euclidean
distances are preserved up to $1 \pm \varepsilon$. The JL lemma is a useful tool for speeding up solutions to several
high-dimensional problems: closest pair, nearest neighbor, diameter, minimum spanning tree, etc.
It also speeds up some clustering and string processing algorithms, and can further be used to
reduce the amount of storage required to store a dataset, e.g. in streaming algorithms. Recently
it has also found applications in approximate numerical algebra problems such as linear regression
and low-rank approximation [9, 26]. See [15, 28] for discussions of these and other applications.

Standard proofs of the JL lemma take a distribution over dense matrices (e.g. i.i.d. Gaussian or
Bernoulli entries), and thus performing the embedding naively takes $O(k \cdot \|x\|_0)$ time where $x$ has
$\|x\|_0$ non-zero entries. Several works have devised other distributions which give faster embedding
times [2, 3, 4, 14, 22, 30], but all these methods require $\Omega(d)$ embedding time even for sparse vectors
(even when $\|x\|_0 = 1$). This feature is particularly unfortunate in streaming applications, where a
vector $x$ receives coordinate-wise updates of the form $x \leftarrow x + v \cdot e_i$ in a data stream, so that to
maintain some linear embedding $Sx$ of $x$ we should repeatedly calculate $Se_i$ during updates. Since
$\|e_i\|_0 = 1$, even the naive $O(k \cdot \|e_i\|_0)$ embedding time method is faster than these approaches.

Even aside from streaming applications, several practical situations give rise to vectors with
$\|x\|_0 \ll d$. For example, a common similarity measure for comparing text documents in data
mining and information retrieval is cosine similarity [25], which is approximately preserved under
any JL embedding. Here, a document is represented as a bag of words with the dimensionality
$d$ being the size of the lexicon, and we usually would not expect any single document to contain
anywhere near $d$ distinct words (i.e., we expect sparse vectors). In networking applications, if $x_{i,j}$
counts bytes sent from source $i$ to destination $j$ in some time interval, then $d$ is the total number
of IP pairs, whereas we would not expect most pairs of IPs to communicate with each other.

One way to speed up embedding time in the JL lemma for sparse vectors is to devise a distribution
over sparse embedding matrices. This was first investigated in [1], which gave a JL distribution
where only one third of the entries of each matrix in its support was non-zero, without increasing
the number of rows $k$ from dense constructions. Later, the works [8, 27] gave a distribution
over matrices with only $O(\log(1/\delta))$ non-zero entries per column, but the algorithm for estimating
$\|x\|_2$ given the linear sketch then relied on a median calculation, and thus these schemes did not
provide an embedding into $\ell_2$. In several applications, such as nearest-neighbor search [16] and
approximate numerical linear algebra [9, 26], an embedding into a normed space or even $\ell_2$ itself
is required, and thus median estimators cannot be used. Recently Dasgupta, Kumar, and Sarlos
[10], building upon work in [31], gave a JL distribution over matrices where each column has at
most $s = \tilde{O}(\varepsilon^{-1}\log^3(1/\delta))$ non-zero entries, thus speeding up the embedding time to $O(s \cdot \|x\|_0)$.
This "DKS construction" requires $O(ds\log k)$ bits of random seed to sample a matrix from their
distribution. The work of [10] left open two main directions: (1) understand the sparsity parameter
$s$ that can be achieved in a JL distribution, and (2) devise a sparse JL transform distribution which
requires few random bits to sample from, for streaming applications where storing a long random
seed requires prohibitively large memory.

The previous work [19] of the current authors made progress on both these questions by showing
$\tilde{O}(\varepsilon^{-1}\log^2(1/\delta))$ sparsity was achievable by giving an alternative analysis of the scheme of [10]
which also only required $O(\log(1/(\varepsilon\delta))\log d)$ seed length. The work of [6] later gave a tighter
analysis under the assumption $\varepsilon < 1/\log^2(1/\delta)$, improving the sparsity and seed length further by
$\log(1/\varepsilon)$ and $\log\log(1/\delta)$ factors in this case. In Section A.1 we show that the DKS scheme requires
$s = \tilde{\Omega}(\varepsilon^{-1}\log^2(1/\delta))$, and thus a departure from their construction is required to obtain better
sparsity. For a discussion of other previous work concerning the JL lemma see [19].

Main Contribution: In this work, we give two new constructions which achieve sparsity $s =
O(\varepsilon^{-1}\log(1/\delta))$ for $\ell_2$ embedding into optimal dimension $k = O(\varepsilon^{-2}\log(1/\delta))$. This is the first
sparsity bound which is always asymptotically smaller than $k$, regardless of how $\varepsilon$ and $\delta$ are related.

Figure 1: In all three constructions above, a vector in $\mathbb{R}^d$ is projected down to $\mathbb{R}^k$. Figure (a) is
the DKS construction in [10], and the two constructions we give in this work are represented in (b)
and (c). The out-degree in each case is $s$, the sparsity.



One of our distributions requires $O(\log(1/\delta)\log d)$ uniform random bits to sample a random matrix.

We also describe variations on our constructions which achieve sparsity $\tilde{O}(\varepsilon^{-1}\log(1/\delta))$, but
which have much simpler analyses. We describe our simpler constructions in Section 3, and our
better constructions in Section 4. We also show in Section A that our analyses are tight up to a
constant factor, so any further improvement in sparsity would require a different construction.

In Section 5 we discuss how to use our new schemes to speed up the numerical linear algebra
algorithms in [9] for approximate linear regression and best rank-$k$ approximation in the streaming
model of computation. In Section 5 we show that any JL distribution automatically provides
approximate matrix sketches as defined in [26]. While [26] also showed this, it lost a logarithmic 
factor in the target dimension due to a union bound in its reduction; the work of [9] avoided this 
loss, but only for the JL distribution of random Bernoulli matrices. We show a simple general 
reduction for any JL distribution which incurs no loss in parameters. Using this fact, plugging in 
our sparse JL transform then yields faster linear algebra algorithms using the same space. 

1.1 Our Approach 

Our constructions are depicted in Figure 1. Figure 1(a) represents the DKS construction of [10]
in which each item is hashed to $s$ random target coordinates with replacement. Our two schemes
achieving $s = \Theta(\varepsilon^{-1}\log(1/\delta))$ are as follows. Construction (b) is much like (a) except that we hash
coordinates $s$ times without replacement. In (c), the target vector is divided up into $s$ contiguous
blocks each of equal size $k/s$, and a given coordinate in the original vector is hashed to a random
location in each block (essentially this is the CountSketch of [8], though we use higher independence
in our hash functions). In all cases (a), (b), and (c), we randomly flip the sign of a coordinate
in the original vector and divide by $\sqrt{s}$ before adding it in any location in the target vector.
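As an illustration of construction (c), the following is a minimal sketch rather than the authors' implementation. It assumes fully random hash locations and signs drawn from Python's `random` module in place of the limited-independence families analyzed later, and the names `sample_block_columns` and `embed` are hypothetical.

```python
import math
import random


def sample_block_columns(d, k, s, rng):
    """Return, for each of the d columns of S, its s non-zero (row, value) pairs."""
    if k % s != 0:
        raise ValueError("s must divide k")
    block = k // s                      # each of the s blocks has k/s rows
    scale = 1.0 / math.sqrt(s)
    columns = []
    for _ in range(d):
        col = []
        for r in range(s):
            row = r * block + rng.randrange(block)   # random location inside block r
            sign = rng.choice((-1.0, 1.0))           # random sign flip
            col.append((row, sign * scale))
        columns.append(col)
    return columns


def embed(columns, x, k):
    """Compute y = Sx, touching s entries of y per non-zero entry of x."""
    y = [0.0] * k
    for i, xi in enumerate(x):
        if xi != 0.0:
            for row, val in columns[i]:
                y[row] += val * xi
    return y
```

Because every column has exactly one entry of magnitude $1/\sqrt{s}$ in each block, a vector with a single non-zero coordinate is embedded with its norm preserved exactly; distortion only arises from collisions between different coordinates.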

We give two different analyses for both our constructions (b) and (c). Look at the random
variable $Z = \|Sx\|_2^2 - \|x\|_2^2$, where $S$ is a random matrix in the JL distribution. Our proofs all use
Markov's bound on the $\ell$th moment to give $\Pr[|Z| > \varepsilon\|x\|_2^2] < \varepsilon^{-\ell}\cdot\mathbf{E}[Z^\ell]$ for $\ell = \log(1/\delta)$ an even
integer. The task is then to bound $\mathbf{E}[Z^\ell]$. In our first approach, we observe that $Z$ is a quadratic
form in the random signs, and thus its moments can be bounded via the Hanson-Wright inequality
[13]. This analysis turns out to reveal that the hashing to coordinates in the target vector need not
be done randomly, but can in fact be specified by any sufficiently good code. Specifically, in (b) it
suffices for the columns of the embedding matrix (ignoring the random signs and division by $\sqrt{s}$)
to be codewords in a constant-weight binary code of weight $s$ and minimum distance $s - O(s^2/k)$.
In (c), if for each $i \in [d]$ we let $C_i$ be a length-$s$ vector with entries in $[k/s]$ specifying where
coordinate $i$ is mapped to in each block, it suffices for $\{C_i\}_{i=1}^{d}$ to be a code of minimum distance
$s - O(s^2/k)$. It is fairly easy to see that if one wants a deterministic hash function, it is necessary
for the columns of the embedding matrix to be specified by a code: if two coordinates have small
Hamming distance in their vectors of hash locations, it means they collide often. Since collision
is the source of error, an adversary in this case could ask to embed a vector which has its mass
equally spread on the two coordinates whose hash locations have small Hamming distance, causing
large error with large probability over the choice of random signs. What our analysis shows is that
not only is a good code necessary, but it is also sufficient.

In our second analysis approach, we expand $Z^\ell$ to obtain a polynomial with roughly $d^{2\ell}$ terms.
We view its monomials as being in correspondence with graphs, group monomials whose graphs
are isomorphic, then do some combinatorics to make the expectation calculation feasible. In this
approach, we assume that the random signs as well as the hashing to coordinates in the target
vector are done $2\log(1/\delta)$-wise independently. This graph-based approach played a large role
in the analysis in our previous work [19] (which this work subsumes), and was later also used
in [6]. We point out here that Figure 1(c) is somewhat simpler to implement, since there are
simple constructions of $2\log(1/\delta)$-wise hash families [7]. Figure 1(b) on the other hand requires
hashing without replacement, which amounts to using random permutations. We thus derandomize
Figure 1(b) using almost $2\log(1/\delta)$-wise independent permutation families [21].
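For concreteness, one standard way to obtain a $t$-wise independent hash family is to evaluate a uniformly random polynomial of degree less than $t$ over a prime field. The sketch below illustrates that idea only; whether it matches the exact construction cited as [7] is an assumption, the function names are hypothetical, and reducing the output from $\mathbb{Z}_p$ to a smaller range such as $[k/s]$ would introduce a small non-uniformity that is ignored here.

```python
import random


def sample_poly_hash(t, p, rng):
    """Coefficients of a uniformly random polynomial of degree < t over Z_p (p prime)."""
    return [rng.randrange(p) for _ in range(t)]


def eval_poly_hash(coeffs, x, p):
    """Evaluate the polynomial at x modulo p with Horner's rule; coeffs[0] is the constant term."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc
```

Storing the hash function takes only $t$ field elements, which is the reason limited independence keeps the seed short.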

2 Conventions and Notation 

Definition 2. For $A \in \mathbb{R}^{n \times n}$, we define the Frobenius norm of $A$ as $\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2}$.

Definition 3. For $A \in \mathbb{R}^{n \times n}$, we define the operator norm of $A$ as $\|A\|_2 = \sup_{\|x\|_2 = 1}\|Ax\|_2$. In
the case $A$ is symmetric, this is also the largest magnitude of an eigenvalue of $A$.

Henceforth, all logarithms are base-2 unless explicitly stated otherwise. For a positive integer
$n$ we use $[n]$ to denote the set $\{1, \ldots, n\}$. $S^{d-1}$ denotes $\{y \in \mathbb{R}^d : \|y\|_2 = 1\}$. We will always be
focused on embedding a vector $x \in \mathbb{R}^d$ into $\mathbb{R}^k$, and we assume $\|x\|_2 = 1$ without loss of generality
(since our embeddings are linear). All vectors $v$ are assumed to be column vectors, and $v^T$ denotes
its transpose. We often implicitly assume that various quantities are powers of 2 or 4, which is
without loss of generality. Space complexity bounds (as in Section 5) are always measured in bits.

Definition 4. The Hamming distance $\Delta(u, v)$ of two vectors $u, v$ is $|\{i : u_i \neq v_i\}|$. An $(n, k, d)_q$
code is a set of $q^k$ vectors in $[q]^n$ with all pairwise Hamming distances at least $d$.
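Definition 4 translates directly into code. The following is a small sketch with hypothetical helper names that computes the Hamming distance and the minimum distance of a code given as a list of equal-length tuples.

```python
from itertools import combinations


def hamming(u, v):
    """Number of positions in which u and v differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    return sum(1 for a, b in zip(u, v) if a != b)


def min_distance(code):
    """Smallest pairwise Hamming distance over all pairs of distinct codewords."""
    return min(hamming(u, v) for u, v in combinations(code, 2))
```

For instance, the four even-weight binary words of length 3 form a code with minimum distance 2.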

3 Code-Based Constructions 

In this section, we provide analyses of our constructions (b) and (c) in Figure [1] when the hash 
locations are determined by some fixed error-correcting code. We give the full argument for (c) 
below, then discuss in Remark 11 how essentially the same argument can be applied to analyze (b).

Define $k = C \cdot \varepsilon^{-2}\log(1/\delta)$ for a sufficiently large constant $C$. Let $s$ be some integer dividing
$k$ satisfying $s \ge 2\varepsilon^{-1}\log(1/\delta)$. Let $\mathcal{C} = \{C_1, \ldots, C_d\}$ be any $(s, \log_{k/s} d, s - O(s^2/k))_{k/s}$ code. We
specify our JL family by describing the embedded vector $y$. Define hash functions $\sigma : [d] \times [s] \to
\{-1, 1\}$ and $h : [d] \times [s] \to [k/s]$. The former is drawn at random from a $2\log(1/\delta)$-wise independent
family, and the latter has $h(i, j)$ being the $j$th entry of the $i$th codeword in $\mathcal{C}$. We conceptually
view $y \in \mathbb{R}^k$ as being in $\mathbb{R}^{s \times (k/s)}$. Our embedded vector then has $y_{r,j} = \sum_{h(i,r) = j} \sigma(i,r)\, x_i/\sqrt{s}$.

This describes our JL family, which is indexed by $\sigma$. Note the sparsity is $s$.

Remark 5. It is important to know whether an $(s, \log_{k/s} d, s - O(s^2/k))_{k/s}$ code exists. By picking
$h$ at random from an $O(\log(d/\delta))$-wise independent family and setting $s \ge \Omega(\varepsilon^{-1}\sqrt{\log(d/\delta)\log(1/\delta)})$,
it is not too hard to show via the Chernoff bound (or more accurately, Markov's bound applied
with the $O(\log(d/\delta))$th moment bound implied by integrating the Chernoff bound) followed by a
union bound over all $\binom{d}{2}$ pairs of vectors that $h$ defines a good code with probability $1 - \delta$. We
do not perform this analysis here since Section 4.1 obtains better parameters. We also point out
that we may assume without loss of generality that $d = O(\varepsilon^{-2}/\delta)$. This is because there exists
an embedding into this dimension with sparsity 1 using only 4-wise independence with distortion
$(1 + \varepsilon)$ and success probability $1 - \delta$ [8, 27]. It is worth noting that in the construction in this
section, potentially $h$ could be deterministic given an explicit code with our desired parameters.



Analysis of Figure 1(c) code-based construction: We first note

\[ \|y\|_2^2 = \|x\|_2^2 + \frac{1}{s}\sum_{i \neq j}\sum_{r=1}^{s} \eta_{i,j,r}\, x_i x_j\, \sigma(i,r)\sigma(j,r), \]

where $\eta_{i,j,r}$ is 1 if $h(i,r) = h(j,r)$, and $\eta_{i,j,r} = 0$ otherwise. We thus would like that

\[ Z = \frac{1}{s}\sum_{i \neq j}\sum_{r=1}^{s} \eta_{i,j,r}\, x_i x_j\, \sigma(i,r)\sigma(j,r) \tag{1} \]

is concentrated about 0. Note $Z$ is a quadratic form in $\sigma$ which can be written as $\sigma^T T \sigma$ for an
$sd \times sd$ block-diagonal matrix $T$. There are $s$ blocks, each $d \times d$, where in the $r$th block $T_r$ we have
$(T_r)_{i,j} = x_i x_j \eta_{i,j,r}/s$ for $i \neq j$ and $(T_r)_{i,i} = 0$ for all $i$. Now, $\Pr[|Z| > \varepsilon] = \Pr[|\sigma^T T \sigma| > \varepsilon]$. To
bound this probability, we use the Hanson-Wright inequality combined with a Markov bound.
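The identity $\|y\|_2^2 = \|x\|_2^2 + Z$ can be checked numerically. The sketch below is an illustration under stated assumptions: it uses fully random hash locations in place of a fixed code (the identity holds for any $h$), a unit-norm Gaussian test vector, and the hypothetical function name `embedding_norm_vs_eq1`.

```python
import math
import random


def embedding_norm_vs_eq1(d=8, k=12, s=4, seed=0):
    """Return (||y||_2^2, 1 + Z) for a random unit vector x; the two should agree."""
    rng = random.Random(seed)
    block = k // s
    h = [[rng.randrange(block) for _ in range(s)] for _ in range(d)]
    sigma = [[rng.choice((-1, 1)) for _ in range(s)] for _ in range(d)]
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in x))
    x = [v / norm for v in x]

    # y_{r,j} = sum over i with h(i,r) = j of sigma(i,r) * x_i / sqrt(s)
    y = [[0.0] * block for _ in range(s)]
    for i in range(d):
        for r in range(s):
            y[r][h[i][r]] += sigma[i][r] * x[i] / math.sqrt(s)
    y_norm_sq = sum(v * v for row in y for v in row)

    # Z as in Eq. (1): only pairs i != j that collide in block r contribute
    z = 0.0
    for r in range(s):
        for i in range(d):
            for j in range(d):
                if i != j and h[i][r] == h[j][r]:
                    z += x[i] * x[j] * sigma[i][r] * sigma[j][r]
    z /= s
    return y_norm_sq, 1.0 + z
```

The diagonal terms contribute exactly $\|x\|_2^2$ because each coordinate appears once per block with weight $1/s$, which is why only colliding pairs enter $Z$.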

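Theorem 6 (Hanson-Wright inequality [13]). Let $z = (z_1, \ldots, z_n)$ be a vector of i.i.d. Bernoulli
$\pm 1$ random variables. For any symmetric $B \in \mathbb{R}^{n \times n}$ and $\ell \ge 2$,

\[ \mathbf{E}\left[\left|z^T B z - \mathrm{trace}(B)\right|^\ell\right] \le C^\ell \cdot \max\left\{\sqrt{\ell}\cdot\|B\|_F,\ \ell\cdot\|B\|_2\right\}^\ell \]

for some universal constant $C > 0$.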

We prove our construction satisfies the JL lemma by applying Theorem 6 with $z = \sigma$, $B = T$.

Lemma 7. $\|T\|_F^2 = O(1/k)$.

Proof.

\[ \|T\|_F^2 = \frac{1}{s^2}\cdot\sum_{i \neq j} x_i^2 x_j^2\cdot\left(\sum_{r=1}^{s}\eta_{i,j,r}\right) = \frac{1}{s^2}\cdot\sum_{i \neq j} x_i^2 x_j^2\cdot\left(s - \Delta(C_i, C_j)\right) \le O(1/k)\cdot\|x\|_2^4 = O(1/k). \]

■

Lemma 8. $\|T\|_2 \le 1/s$.

Proof. Since $T$ is block-diagonal, its eigenvalues are the eigenvalues of each block. For a block $T_r$,
write $T_r = (1/s)\cdot(S_r - D)$. $D$ is diagonal with $D_{i,i} = x_i^2$, and $(S_r)_{i,j} = x_i x_j \eta_{i,j,r}$, including when
$i = j$. Since $S_r$ and $D$ are both positive semidefinite, we have $\|T_r\|_2 \le (1/s)\cdot\max\{\|S_r\|_2, \|D\|_2\}$.

We have $\|D\|_2 = \|x\|_\infty^2 \le 1$. For $S_r$, define $u_t$ for $t \in [k/s]$ by $(u_t)_i = x_i$ if $h(i,r) = t$, and $(u_t)_i = 0$
otherwise. Then $u_1, \ldots, u_{k/s}$ are eigenvectors of $S_r$ each with eigenvalue $\|u_t\|_2^2$, and furthermore
they span the image of $S_r$. Thus $\|S_r\|_2 = \max_t \|u_t\|_2^2 \le \|x\|_2^2 = 1$. ■

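Theorem 9. $\Pr_\sigma[\,|\|y\|_2^2 - 1| > \varepsilon\,] < \delta$.

Proof. By a Markov bound applied to $Z^\ell$ for $\ell$ an even integer,

\[ \Pr_\sigma[|Z| > \varepsilon] < \varepsilon^{-\ell}\cdot\mathbf{E}_\sigma[Z^\ell]. \]

Since $Z = \sigma^T T \sigma$ and $\mathrm{trace}(T) = 0$, applying Theorem 6 with $B = T$, $z = \sigma$, and $\ell \le \log(1/\delta)$ gives

\[ \Pr_\sigma[|Z| > \varepsilon] < C^\ell\cdot\max\left\{O(\varepsilon^{-1})\cdot\sqrt{\frac{\ell}{k}},\ \varepsilon^{-1}\cdot\frac{\ell}{s}\right\}^\ell \tag{2} \]

since the $\ell$th moment is determined by $2\log(1/\delta)$-wise independence of $\sigma$. We conclude the proof
by noting that the expression in Eq. (2) is at most $\delta$ for $\ell = \log(1/\delta)$ and our choices for $s, k$. ■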

Remark 10. Only using that $\mathcal{C}$ has sufficiently high minimum distance, it is impossible to improve
our analysis further. For example, for any $(s, \log_{k/s} d, s - O(s^2/k))_{k/s}$ code $\mathcal{C}$, create a new code
$\mathcal{C}'$ which simply replaces the first letter of each codeword with "1"; $\mathcal{C}'$ then still has roughly the
same minimum distance. However, in our construction this corresponds to all indices colliding in
the first chunk of $k/s$ coordinates, which creates an error term of $(1/s)\cdot\sum_{i \neq j} x_i x_j \sigma(i,r)\sigma(j,r)$ for $r = 1$.
Now, suppose $x$ consists of $t = (1/2)\cdot\log(1/\delta)$ entries each with value $1/\sqrt{t}$. Then, with probability
$\sqrt{\delta} \gg \delta$, all these entries receive the same sign under $\sigma$ and contribute a total error of $\Omega(t/s)$ in
the first chunk alone. We thus need $t/s = O(\varepsilon)$, which implies $s = \Omega(\varepsilon^{-1}\log(1/\delta))$.
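The counting in Remark 10 can be reproduced by brute force for small $t$. The following sketch, with hypothetical function names, enumerates all sign patterns and evaluates the first-chunk error when every coordinate collides; it uses the identity $\sum_{i \neq j}\sigma_i\sigma_j = (\sum_i \sigma_i)^2 - t$.

```python
from itertools import product


def first_chunk_error(signs, s):
    """(1/s) * sum_{i != j} x_i x_j sigma_i sigma_j with every x_i = 1/sqrt(t)."""
    t = len(signs)
    total = sum(signs)
    return (total * total - t) / (t * s)


def same_sign_probability(t):
    """Fraction of the 2^t sign patterns in which all t signs agree."""
    patterns = list(product((-1, 1), repeat=t))
    agree = sum(1 for p in patterns if abs(sum(p)) == t)
    return agree / len(patterns)
```

When all signs agree the error equals $(t-1)/s$, and the agreement probability is $2^{1-t}$, which for $t = (1/2)\log(1/\delta)$ is on the order of $\sqrt{\delta}$.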

Remark 11. It is also possible to use a code to specify the hash locations in Figure 1(b). In
particular, let the $j$th entry of the $i$th column of the embedding matrix be the $j$th symbol of the
$i$th codeword (which we call $h(i,j)$) in a weight-$s$ binary code of minimum distance $s - O(s^2/k)$ for
$s \ge 2\varepsilon^{-1}\log(1/\delta)$. Define $\eta_{i,j,r}$ for $i, j \in [d]$, $r \in [s]$ as an indicator variable for $h(i,r) = h(j,r) = 1$.
Then, the error is again exactly as in Eq. (1). The Frobenius norm proof is identical, and the
operator norm proof is nearly identical except that we have $k$ blocks in our block-diagonal matrix
instead of $s$. Also, as in Remark 5, such a code can be shown to exist via the probabilistic method
(the Chernoff bound can be applied using negative dependence, followed by a union bound) as long
as $s = \Omega(\varepsilon^{-1}\sqrt{\log(d/\delta)\log(1/\delta)})$. We omit the details since Section 4.2 obtains better parameters.



4 Random Hashing Constructions 

In this section, we show that if the hash functions $h$ described in Section 3 and Remark 11 are
not specified by fixed codes, but rather are chosen at random from some family of sufficiently high
independence, then one can achieve sparsity $O(\varepsilon^{-1}\log(1/\delta))$ (in the case of Figure 1(b), we actually
need almost $k$-wise independent permutations). Recall our bottleneck in reducing the sparsity in
Section 3 was actually obtaining the codes, discussed in Remark 5 and Remark 11.

4.1 Block Construction

Here we analyze the construction of Figure 1(c), except rather than let $\mathcal{C}$ be an arbitrary code,
we let the underlying hash function $h : [d] \times [s] \to [k/s]$ be randomly selected from a $2\log(1/\delta)$-wise
independent family. Note that one can sample a random matrix from this family using an
$O(\log(1/\delta)\log d)$-length seed.

We perform our analysis by bounding the $\ell$th moment of $Z$ from first principles for $\ell = \log(1/\delta)$
an even integer (for this particular scheme, it seems the Hanson-Wright inequality does not simplify
any details of the proof). We then use Markov's inequality to say $\Pr_{h,\sigma}[|Z| > \varepsilon] < \varepsilon^{-\ell}\cdot\mathbf{E}_{h,\sigma}[Z^\ell]$.

Let $Z_r = \sum_{i \neq j}\eta_{i,j,r}\, x_i x_j\, \sigma(i,r)\sigma(j,r)$ so that $Z = (1/s)\cdot\sum_{r=1}^{s} Z_r$. We first bound the $t$th
moment of each $Z_r$ for $1 \le t \le \ell$. As in the Frobenius norm moment bound of [19], and also used
later in [6], the main idea is to observe that monomials appearing in the expansion of $Z_r^t$ can be
thought of in correspondence with graphs. Notice

\[ Z_r^t = \sum_{i_1 \neq j_1, \ldots, i_t \neq j_t}\ \prod_{u=1}^{t}\eta_{i_u,j_u,r}\, x_{i_u} x_{j_u}\, \sigma(i_u,r)\sigma(j_u,r). \tag{3} \]

Each monomial corresponds to a directed multigraph with labeled edges whose vertices correspond
to the distinct $i_u$ and $j_u$. An $x_{i_u}x_{j_u}$ term corresponds to a directed edge with label $u$ from the
vertex corresponding to $i_u$ to the vertex corresponding to $j_u$. The main idea to bound $\mathbf{E}_{h,\sigma}[Z_r^t]$ is
then to group monomials whose corresponding graphs are isomorphic, then do some combinatorics.



Lemma 12. For $t \le \log(1/\delta)$,

\[ \mathbf{E}_{h,\sigma}[Z_r^t] \le 2^{O(t)}\cdot\begin{cases} s/k & t < \log(k/s) \\ (t/\log(k/s))^t & \text{otherwise.} \end{cases} \]


Proof. Let $\mathcal{G}_t$ be the set of isomorphism classes of directed multigraphs with $t$ labeled edges with
distinct labels in $[t]$, where each vertex has positive and even degree (the sum of in- and out-degrees),
and the number of vertices is between 2 and $t$. Let $\mathcal{G}'_t$ be similar, but with labeled vertices and
connected components as well, where vertices have distinct labels between 1 and the number of
vertices, and components have distinct labels between 1 and the number of components. Let $f$
map the monomials appearing in Eq. (3) to the corresponding graph isomorphism class. By $2t$-wise
independence of $\sigma$, any monomial in Eq. (3) whose corresponding graph does not have all even
degrees has expectation 0. For a graph $G$, we let $v$ denote the number of vertices, and $m$ the
number of connected components. Let $d_u$ denote the degree of a vertex $u$. Then,

\[
\begin{aligned}
\mathbf{E}_{h,\sigma}[Z_r^t] &= \sum_{i_1 \neq j_1, \ldots, i_t \neq j_t}\left(\prod_{u=1}^{t} x_{i_u}x_{j_u}\right)\cdot\mathbf{E}\left[\prod_{u=1}^{t}\sigma(i_u,r)\sigma(j_u,r)\right]\cdot\mathbf{E}\left[\prod_{u=1}^{t}\eta_{i_u,j_u,r}\right] \\
&= \sum_{G \in \mathcal{G}_t}\ \sum_{\substack{i_1 \neq j_1, \ldots, i_t \neq j_t \\ f((i_u,j_u)_{u=1}^{t}) = G}}\left(\frac{s}{k}\right)^{v-m}\cdot\left(\prod_{u=1}^{t} x_{i_u}x_{j_u}\right) && (4) \\
&\le \sum_{G \in \mathcal{G}_t}\left(\frac{s}{k}\right)^{v-m}\cdot v!\cdot\frac{1}{\binom{t}{d_1/2, \ldots, d_v/2}} && (5) \\
&= \sum_{G \in \mathcal{G}'_t}\left(\frac{s}{k}\right)^{v-m}\cdot\frac{1}{m!}\cdot\frac{1}{\binom{t}{d_1/2, \ldots, d_v/2}} && (6)
\end{aligned}
\]

We now justify these inequalities. The justification of Eq. (4) is similar to that in the Frobenius
norm bound in [19]. That is, $\prod_{u=1}^{t}\eta_{i_u,j_u,r}$ is determined by $h(i_u,r), h(j_u,r)$ for each $u \in [t]$, and
hence its expectation is determined by $2t$-wise independence of $h$. This product is 1 if $i_u$ and $j_u$ hash
to the same element for each $u$ and is 0 otherwise. Every $i_u, j_u$ pair hashes to the same element if
and only if for each connected component of $G$, all elements of $\{i_1, \ldots, i_t, j_1, \ldots, j_t\}$ corresponding
to vertices in that component hash to the same value. We can choose one element of $[k/s]$ for
each component to be hashed to, thus giving $(k/s)^m$ possibilities. The probability of any particular
hashing is $(k/s)^{-v}$, and this gives that the expectation of the product is $(s/k)^{v-m}$.

For Eq. (5), note that $(\|x\|_2^2)^t = 1$, and the coefficient of $\prod_{u=1}^{v} x_{a_u}^{d_u}$ in its expansion for $\sum_u d_u = 2t$
is $\binom{t}{d_1/2, \ldots, d_v/2}$. Meanwhile, the coefficient of this monomial when summing over all $i_1 \neq j_1, \ldots, i_t \neq
j_t$ for a particular $G \in \mathcal{G}_t$ is at most $v!$. For Eq. (6), we move from isomorphism classes in $\mathcal{G}_t$ to
those in $\mathcal{G}'_t$. For any $G \in \mathcal{G}_t$, there are $v!\cdot m!$ ways to label vertices and connected components.

We now bound the sum of the $1/\binom{t}{d_1/2, \ldots, d_v/2}$ term. Fix $v_1, \ldots, v_m, t_1, \ldots, t_m$ (where there are
$v_i$ vertices and $t_i$ edges in the $i$th component $C_i$), and the assignment of vertex and edge labels to
connected components. We upper bound Eq. (6) by considering building $G$ edge by edge, starting
with 0 edges. Let the initial graph be $G_0$, and we form $G = G_t$ by adding edges in increasing label
order. We then want to bound the sum of $1/\binom{t}{d_1/2, \ldots, d_v/2}$ over $G \in \mathcal{G}'_t$ which satisfy the quantities
we have fixed. Note $1/\binom{t}{d_1/2, \ldots, d_v/2}$ equals $2^{O(t)}\cdot t^{-t}\cdot\prod_{u=1}^{v}\sqrt{d_u}^{\,d_u}$. Initially, when $t = 0$, our
sum is $S_0 = 1$. When considering all ways to add the next edge to $G_u$ to form $G_{u+1}$, an edge
$i \to j$ contributes $S_u\cdot\sqrt{d_i d_j}/t$ to $S_{u+1}$. Since we fixed assignments of edge labels to connected
components, this edge must come from some particular component $C_w$. Summing over vertices
$i \neq j$ in $C_w$ and applying Cauchy-Schwarz,

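\[ \sum_{i \neq j \in C_w}\frac{\sqrt{d_i d_j}}{t} \le \frac{1}{t}\cdot\left(\sum_{i \in C_w}\sqrt{d_i}\right)^2 \le \frac{2 v_w t_w}{t}. \]

Since there are $\binom{v}{v_1, \ldots, v_m}\binom{t}{t_1, \ldots, t_m}$ ways to assign edge and vertex labels to components, Eq. (6) gives

\[
\begin{aligned}
\mathbf{E}_{h,\sigma}[Z_r^t] &\le 2^{O(t)}\cdot\sum_{v=2}^{t}\sum_{m=1}^{v/2}\left(\frac{s}{k}\right)^{v-m}\cdot\frac{1}{m!}\cdot\sum_{\substack{v_1, \ldots, v_m \\ t_1, \ldots, t_m}}\binom{v}{v_1, \ldots, v_m}\binom{t}{t_1, \ldots, t_m}\prod_{w=1}^{m}\left(\frac{v_w t_w}{t}\right)^{t_w} \\
&\le 2^{O(t)}\cdot\sum_{v=2}^{t}\sum_{m=1}^{v/2}\left(\frac{s}{k}\right)^{v-m}\cdot v^t && (7) \\
&\le 2^{O(t)}\cdot\sum_{v=2}^{t}\sum_{m=1}^{v/2}\left(\frac{s}{k}\right)^{v-m}\cdot(v-m)^t && (8)
\end{aligned}
\]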

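Eq. (7) holds since there are at most $2^{v+t}$ ways to choose the $v_i, t_i$ and $t_i \ge v_i$. Eq. (8) follows since
$v \ge 2m$ and thus $v = O(v - m)$. Setting $q = v - m$ and under the constraint $q \ge 1$, $(s/k)^q\cdot q^t$ is
maximized when $q = \max\{1, \Theta(t/\log(k/s))\}$. The lemma follows. ■

Theorem 13. Our construction in this section gives a JL family with sparsity $s = O(\varepsilon^{-1}\cdot\log(1/\delta))$.

Proof. We have

\[
\begin{aligned}
\mathbf{E}_{h,\sigma}[Z^\ell] &= \frac{1}{s^\ell}\cdot\sum_{q=1}^{\ell/2}\ \sum_{r_1 < \ldots < r_q}\ \sum_{\substack{\ell_1, \ldots, \ell_q \\ \forall i\ \ell_i > 1,\ \sum_i \ell_i = \ell}}\binom{\ell}{\ell_1, \ldots, \ell_q}\cdot\prod_{i=1}^{q}\mathbf{E}_{h,\sigma}\left[Z_{r_i}^{\ell_i}\right] \\
&\le \frac{1}{s^\ell}\cdot 2^{O(\ell)}\cdot\sum_{q=1}^{\ell/2}\binom{s}{q}\cdot\left(\frac{s}{k}\right)^{q}\cdot\ell^\ell \\
&\le \frac{1}{s^\ell}\cdot 2^{O(\ell)}\cdot\sum_{q=1}^{\ell/2}\ell^\ell\cdot\left(\frac{s^2}{qk}\right)^{q} && (9)
\end{aligned}
\]

Eq. (9) follows since there are $\binom{s}{q}$ ways to choose the $r_i$, and there are at most $2^{\ell-1}$ ways
to choose the $\ell_i$. Furthermore, even for the $\ell_i > \log(k/s)$, we have $2^{O(\ell_i)}\cdot(\ell_i/\log(k/s))^{\ell_i} =
2^{O(\ell_i)}\cdot(s/k)\cdot(\ell_i/\log(k/s))^{\ell_i}$, so that the $(s/k)^q$ term is valid. Taking derivatives shows that the
above is maximized for $q = s^2/(ek) < \ell/2$, which gives a summand of $2^{O(\ell)}\cdot\ell^\ell$. Thus, we have
that the above moment is at most $(\varepsilon/2)^\ell$ when $k = C'\ell/\varepsilon^2$ for sufficiently large $C'$. The claim then
follows since $\Pr_{h,\sigma}[|Z| > \varepsilon] < \varepsilon^{-\ell}\cdot\mathbf{E}_{h,\sigma}[Z^\ell]$ by Markov's inequality, and we set $\ell = \log(1/\delta)$. ■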



4.2 Graph Construction 

In this section, we analyze the construction in Figure 1(b) when the hashing is done randomly. Our
analysis for this construction is quite similar to the analysis in Section 4.1.

The distribution over matrices $S$ is such that each column of $S$ has exactly $s = \Theta(\varepsilon^{-1}\log(1/\delta))$
of its entries, chosen at random, each randomly set to $\pm 1/\sqrt{s}$. All other entries in $S$ are 0. That
is, we pick a random bipartite graph with $d$ vertices in the left vertex set and $k$ in the right, where
every vertex on the left has degree $s$. The matrix $S$ is the incidence matrix of this graph, divided
by $\sqrt{s}$ and with random sign flips.
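A minimal sketch of sampling such a matrix follows. It assumes fully random choices from Python's `random` module rather than the almost-independent permutation families used for derandomization, and the name `sample_graph_columns` is hypothetical.

```python
import math
import random


def sample_graph_columns(d, k, s, rng):
    """For each of the d columns, pick s distinct rows and a random sign per entry."""
    scale = 1.0 / math.sqrt(s)
    columns = []
    for _ in range(d):
        rows = rng.sample(range(k), s)   # hashing without replacement: s distinct rows
        columns.append([(row, rng.choice((-1.0, 1.0)) * scale) for row in rows])
    return columns
```

Each column is stored as a list of (row, value) pairs, so applying the matrix to a vector costs $s$ operations per non-zero coordinate, and every column has unit Euclidean norm by construction.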

We realize the distribution over $S$ via two hash functions $h : [d] \times [k] \to \{0, 1\}$ and $\sigma : [d] \times
[s] \to \{-1, 1\}$. The function $\sigma$ is drawn from a $2\log(1/\delta)$-wise independent family. The
function $h$ has the property that for any $i$, exactly $s$ distinct $r \in [k]$ have $h(i,r) = 1$; in particular,
we pick $d$ seeds $\log(1/\delta)$-wise independently to determine $h_i$ for $i = 1, \ldots, d$, and where each
$h_i$ is drawn from a $\gamma$-almost $2\log(1/\delta)$-wise independent family of permutations on $[d]$ for $\gamma =
(\varepsilon s/(d^2 k))^{\Theta(\log(1/\delta))}$. The seed length required for any one such permutation is $O(\log(1/\delta)\log d +
\log(1/\gamma)) = O(\log(1/\delta)\log d)$ [21], and thus we can pick $d$ such seeds $2\log(1/\delta)$-wise independently
using total seed length $O(\log^2(1/\delta)\log d)$. We then let $h(i,r) = 1$ iff some $j \in [s]$ has $h_i(j) = r$.

If $y = Sx$, then we have

\[ \|y\|_2^2 = \|x\|_2^2 + \frac{1}{s}\cdot\sum_{r}\sum_{i \neq j} x_i x_j\, \sigma(i,r)\sigma(j,r)\, h(i,r)h(j,r). \]

Define

\[ Z = \frac{1}{s}\cdot\sum_{r}\sum_{i \neq j} x_i x_j\, \sigma(i,r)\sigma(j,r)\, h(i,r)h(j,r) = \frac{1}{s}\cdot\sum_{r} Z_r. \]

We would like to show that $\Pr_{h,\sigma}[|Z| > \varepsilon] < \delta$, which we show via the Markov bound
$\Pr_{h,\sigma}[|Z| > \varepsilon] < \varepsilon^{-\ell}\cdot\mathbf{E}_{h,\sigma}[Z^\ell]$ for some sufficiently large even integer $\ell$. We furthermore note
$\mathbf{E}_{h,\sigma}[Z^\ell] \le \mathbf{E}_\sigma[Y^\ell]$ where

\[ Y = \frac{1}{s}\cdot\sum_{r}\sum_{i \neq j} x_i x_j\, \sigma(i,r)\sigma(j,r)\, \delta_{i,r}\delta_{j,r} = \frac{1}{s}\cdot\sum_{r} Y_r, \]

where the $\delta_{i,r}$ are independent 0/1 random variables each with mean $s/k$. This is because, when
expanding $Z^\ell$ into monomials, the expectation over $h$ (after taking the expectation over $\sigma$) only
term-by-term increases by replacing the random variables $h(i,r)$ with $\delta_{i,r}$. We analyze moments of
the $Y_r$ to then obtain an upper bound on the $\mathbf{E}_\sigma[Y^\ell]$ for $\ell = \log(1/\delta)$, which in turn provides an
upper bound on $\mathbf{E}_{h,\sigma}[Z^\ell]$.

We carry out our analysis assuming $h$ is perfectly random, then describe in Remark 16 how to
relax this assumption by using $\gamma$-almost $2\log(1/\delta)$-wise permutations as discussed above.



Lemma 14. For $t > 1$ an integer,

\[ \mathbf{E}_\sigma[Y_r^t] \le 2^{O(t)}\cdot\begin{cases} (s/k)^2 & t < \log(k/s) \\ (t/\log(k/s))^t & \text{otherwise.} \end{cases} \]



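Proof. We have

\[ \mathbf{E}_\sigma[Y_r^t] = \sum_{i_1 \neq j_1, \ldots, i_t \neq j_t}\left(\prod_{u=1}^{t} x_{i_u}x_{j_u}\right)\cdot\mathbf{E}\left[\prod_{u=1}^{t}\sigma(i_u,r)\sigma(j_u,r)\right]\cdot\mathbf{E}\left[\prod_{u=1}^{t}\delta_{i_u,r}\delta_{j_u,r}\right]. \tag{10} \]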

Define $\mathcal{G}_t$ as the set of isomorphism classes of directed multigraphs with $t$ edges having distinct
labels in $[t]$ and no self-loops, with between 2 and $t$ vertices (inclusive), and where every vertex
has an even and positive sum of in- and out-degrees. Let $f$ map variable sequences to their
corresponding graph isomorphism class. For a graph $G$, let $v$ be its number of vertices, and let $d_u$
be the sum of in- and out-degrees of vertex $u$. Then,

\[
\begin{aligned}
\mathbf{E}_\sigma[Y_r^t] &= \sum_{G \in \mathcal{G}_t}\ \sum_{\substack{i_1 \neq j_1, \ldots, i_t \neq j_t \\ f((i_u,j_u)_{u=1}^{t}) = G}}\left(\frac{s}{k}\right)^{v}\cdot\left(\prod_{u=1}^{t} x_{i_u}x_{j_u}\right) \\
&\le \sum_{G \in \mathcal{G}_t}\left(\frac{s}{k}\right)^{v}\cdot v!\cdot\frac{1}{\binom{t}{d_1/2, \ldots, d_v/2}} \\
&= \sum_{G \in \mathcal{G}'_t}\left(\frac{s}{k}\right)^{v}\cdot\frac{1}{\binom{t}{d_1/2, \ldots, d_v/2}} \\
&\le 2^{O(t)}\cdot\sum_{v}\left(\frac{s}{k}\right)^{v}\cdot\frac{1}{t^t}\cdot\left(\sum_{G}\prod_{u=1}^{v}\sqrt{d_u}^{\,d_u}\right) && (11)
\end{aligned}
\]

where $\mathcal{G}'_t$ is the set of all isomorphism classes of directed multigraphs as in $\mathcal{G}_t$, but in which vertices
are labeled as well, with distinct labels in $[v]$. The summation over $G$ in Eq. (11) is over the $G \in \mathcal{G}'_t$
with $v$ vertices. We bound this summation. We start with a graph with zero edges with vertices
labeled $1, \ldots, v$ then consider how our summation increases as we build the graph edge by edge.
Initially set $S_0 = 1$. We will think of each $S_i$ as a sum of several terms, where each term corresponds
to some graph obtained by a sequence of $i$ edge additions, so that the summation in Eq. (11) is
bounded by $S_t$. When we add the $(i+1)$st edge, we have

\[ S_{i+1}/S_i \le \sum_{u \neq w}\sqrt{d_u}\cdot\sqrt{d_w} \le \left(\sum_{u=1}^{v}\sqrt{d_u}\right)^2 \le 2tv, \]

with the last inequality following by Cauchy-Schwarz. It thus follows that the summation in Eq. (11)
is at most $(2tv)^t$, implying

\[ \mathbf{E}_\sigma[Y_r^t] \le 2^{O(t)}\cdot\sum_{v}\left(\frac{s}{k}\right)^{v}\cdot v^t. \]

The above is maximized for $v = \max\{2, t/\ln(k/s)\}$ (recall $v \ge 2$), giving our lemma. ■
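Theorem 15. $\Pr_{h,\sigma}[|Z| > \varepsilon] < \delta$.

Proof. We have

\[
\begin{aligned}
\mathbf{E}_\sigma[Y^\ell] &= \frac{1}{s^\ell}\cdot\sum_{q=1}^{\ell/2}\ \sum_{r_1 < \ldots < r_q}\ \sum_{\substack{\ell_1, \ldots, \ell_q \\ \forall i\ \ell_i > 1,\ \sum_i \ell_i = \ell}}\binom{\ell}{\ell_1, \ldots, \ell_q}\cdot\prod_{i=1}^{q}\mathbf{E}_\sigma\left[Y_{r_i}^{\ell_i}\right] \\
&\le \frac{1}{s^\ell}\cdot 2^{O(\ell)}\cdot\sum_{q=1}^{\ell/2}\binom{k}{q}\cdot\left(\frac{s}{k}\right)^{2q}\cdot\ell^\ell \\
&\le \frac{1}{s^\ell}\cdot 2^{O(\ell)}\cdot\sum_{q=1}^{\ell/2}\ell^\ell\cdot\left(\frac{s^2}{qk}\right)^{q} && (12)
\end{aligned}
\]

The above expression is then identical to that in the proof of Theorem 13, and thus it is at
most $(\varepsilon/2)^\ell$. We then set $\ell = \log(1/\delta)$ an even integer so that, by Markov's inequality,

\[ \Pr_{h,\sigma}[|Z| > \varepsilon] < \varepsilon^{-\ell}\cdot\mathbf{E}_{h,\sigma}[Z^\ell] \le \varepsilon^{-\ell}\cdot\mathbf{E}_\sigma[Y^\ell] \le 2^{-\ell} = \delta. \]

■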



Remark 16. As mentioned in Section 4.2, we can specify $h$ via $d$ hash functions $h_i$ chosen
$\log(1/\delta)$-wise independently where each $h_i$ is drawn at random from a $\gamma$-almost $2\log(1/\delta)$-wise
independent family of permutations, and where the seeds used to generate the $h_i$ are drawn
$\log(1/\delta)$-wise independently. Here, $\gamma = (\varepsilon s/(d^2 k))^{\Theta(\log(1/\delta))}$. In general, a $\gamma$-almost $\ell$-wise independent
family of permutations from $[d]$ onto itself is a family of permutations $\mathcal{F}$ where the image of any
fixed $\ell$ elements in $[d]$ has statistical distance at most $\gamma$ when choosing a random $f \in \mathcal{F}$ when
compared with choosing a uniformly random permutation $f$. Now, there are $(kd^2)^\ell$ monomials in
the expansion of $Z^\ell$. In each such monomial, the coefficient of the $\mathbf{E}[\prod_u h(i_u, r_u)h(j_u, r_u)]$ term is
at most $s^{-\ell}$. In the end, we want $\mathbf{E}_{h,\sigma}[Z^\ell] < O(\varepsilon)^\ell$ to apply Markov's inequality. Thus, we want
$(kd^2/s)^\ell\cdot\gamma < O(\varepsilon)^\ell$.

Remark 17. It is worth noting that if one wants distortion $1 \pm \varepsilon_i$ with probability $1 - \delta_i$ simultaneously
for all $i$ in some set $S$, our proofs of Theorem 13 and Theorem 15 reveal that it suffices to set
$s = C\cdot\sup_{i \in S}\varepsilon_i^{-1}\log(1/\delta_i)$ and $k = C\cdot\sup_{i \in S}\varepsilon_i^{-2}\log(1/\delta_i)$ in both our constructions Figure 1(b)
and Figure 1(c).



5 Faster numerical linear algebra streaming algorithms 

The works of [9, 26] gave algorithms to solve various approximate numerical linear algebra problems
given small memory and only one or a few passes over an input matrix. They considered models
where one only sees a row or column at a time of some matrix $A \in \mathbb{R}^{d \times n}$. Another update model
considered was the turnstile streaming model. In this model, the matrix $A$ starts off as 0. One
then sees a sequence of $m$ updates $(i_1, j_1, v_1), \ldots, (i_m, j_m, v_m)$, where each update $(i, j, v)$ triggers
the change $A_{i,j} \leftarrow A_{i,j} + v$. The goal in all these models is to compute some functions of $A$ at
the end of seeing all rows, columns, or turnstile updates. The algorithm should use little memory
(much less than what is required to store $A$ explicitly). Both works [9, 26] solved problems such
as approximate linear regression and best rank-$k$ approximation by reducing to the problem of
sketches for approximate matrix products. Before delving further, first we give a definition.

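Definition 18. Distribution $\mathcal{D}$ over $\mathbb{R}^{k \times d}$ has $(\varepsilon, \delta)$-JL moments if for $\ell = \log(1/\delta)$ and $\forall x \in S^{d-1}$,

\[
\mathbb{E}_{S \sim \mathcal{D}}\left[\left|\,\|Sx\|_2^2 - 1\,\right|^{\ell}\right] \le \left(\frac{\varepsilon}{2}\right)^{\ell}.
\]
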

Now, the following theorem is a generalization of [9, Theorem 2.1]. The theorem states that any
distribution with JL moments also provides a sketch for approximate matrix products. A similar
statement was made in [26, Lemma 6], but that statement was slightly weaker in its parameters
because it resorted to a union bound, which we avoid by using Minkowski's inequality.

Theorem 19. Given $0 < \varepsilon, \delta < 1/2$, let $\mathcal{D}$ be any distribution over matrices with d columns with
the $(\varepsilon, \delta)$-JL moment property. Then for A, B any real matrices with d rows and $\|A\|_F = \|B\|_F = 1$,

\[
\Pr_{S \sim \mathcal{D}}\left[\|A^T S^T S B - A^T B\|_F > 3\varepsilon/2\right] < \delta.
\]

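Proof. Let $x, y \in \mathbb{R}^d$ each have $\ell_2$ norm 1. Then

\[
\langle Sx, Sy \rangle = \frac{\|Sx\|_2^2 + \|Sy\|_2^2 - \|S(x-y)\|_2^2}{2}
\]

so that

\[
\begin{aligned}
\mathbb{E}\left[|\langle Sx, Sy\rangle - \langle x, y\rangle|^{\ell}\right]
&= \frac{1}{2^{\ell}} \cdot \mathbb{E}\left[\left|(\|Sx\|_2^2 - 1) + (\|Sy\|_2^2 - 1) - (\|S(x-y)\|_2^2 - \|x-y\|_2^2)\right|^{\ell}\right] \\
&\le \frac{3^{\ell}}{2^{\ell}} \cdot \max\left\{ \mathbb{E}\left[\left|\|Sx\|_2^2 - 1\right|^{\ell}\right],\ \mathbb{E}\left[\left|\|Sy\|_2^2 - 1\right|^{\ell}\right],\ \mathbb{E}\left[\left|\|S(x-y)\|_2^2 - \|x-y\|_2^2\right|^{\ell}\right] \right\} \\
&\le \left(\frac{3\varepsilon}{4}\right)^{\ell},
\end{aligned}
\]

with the middle inequality following by Minkowski's inequality. Now, if A has n columns and B has
m columns, label the columns of A as $x_1, \ldots, x_n \in \mathbb{R}^d$ and the columns of B as $y_1, \ldots, y_m \in \mathbb{R}^d$.
Define the random variable $X_{i,j} = 1/(\|x_i\|_2 \|y_j\|_2) \cdot (\langle Sx_i, Sy_j\rangle - \langle x_i, y_j\rangle)$. Then
$\|A^T S^T S B - A^T B\|_F^2 = \sum_{i,j} \|x_i\|_2^2 \cdot \|y_j\|_2^2 \cdot X_{i,j}^2$. Then again by Minkowski's inequality,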



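\[
\begin{aligned}
\mathbb{E}\left[\|A^T S^T S B - A^T B\|_F^{\ell}\right]^{2/\ell}
&= \mathbb{E}\left[\Big|\sum_{i,j} \|x_i\|_2^2 \cdot \|y_j\|_2^2 \cdot X_{i,j}^2\Big|^{\ell/2}\right]^{2/\ell} \\
&\le \sum_{i,j} \|x_i\|_2^2 \cdot \|y_j\|_2^2 \cdot \mathbb{E}\left[|X_{i,j}|^{\ell}\right]^{2/\ell} \\
&\le (3\varepsilon/4)^2 \cdot \|A\|_F^2 \cdot \|B\|_F^2 \\
&= (3\varepsilon/4)^2.
\end{aligned}
\]

For $\ell = \log(1/\delta)$, $\Pr\left[\|A^T S^T S B - A^T B\|_F > 3\varepsilon/2\right] < (3\varepsilon/2)^{-\ell} \cdot \mathbb{E}\left[\|A^T S^T S B - A^T B\|_F^{\ell}\right] \le \delta$. ■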

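Remark 20. Often when one constructs a JL distribution $\mathcal{D}$ over $k \times d$ matrices, it is shown that

\[
\forall x \in S^{d-1}\ \forall \varepsilon > 1/\sqrt{k}: \quad \Pr_{S \sim \mathcal{D}}\left[\left|\|Sx\|_2^2 - 1\right| > \varepsilon\right] < e^{-\Theta(\varepsilon^2 k)}.
\]

Any such distribution automatically satisfies the $(\varepsilon, e^{-\Theta(\varepsilon^2 k)})$-JL moment property for any $\varepsilon > 1/\sqrt{k}$
by converting the tail bound into a moment bound via integration by parts.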
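
To make the conversion in Remark 20 concrete, here is a sketch of the calculation under a simplifying assumption that is not in the remark itself: that the tail bound takes the explicit form $\Pr[|Z| > t] \le e^{-ckt^2}$ for all $t \ge 1/\sqrt{k}$, where $Z = \|Sx\|_2^2 - 1$ and $c > 0$ is a hypothetical absolute constant introduced only for this illustration. For $t < 1/\sqrt{k}$ the probability is bounded trivially by 1, and we take $\ell \ge 2$ so that $\Gamma(\ell/2 + 1) \le (\ell/2)^{\ell/2}$.

```latex
\begin{align*}
\mathbb{E}\,|Z|^{\ell}
  &= \int_0^{\infty} \ell\, t^{\ell-1} \Pr[|Z| > t]\, dt \\
  &\le \int_0^{1/\sqrt{k}} \ell\, t^{\ell-1}\, dt
     + \int_0^{\infty} \ell\, t^{\ell-1} e^{-ckt^2}\, dt \\
  &= k^{-\ell/2} + \Gamma(\ell/2 + 1)\,(ck)^{-\ell/2} \\
  &\le k^{-\ell/2} + \left(\frac{\ell}{2ck}\right)^{\ell/2}.
\end{align*}
```

Under this assumption, taking $k \ge C\varepsilon^{-2}\ell$ with $C \ge \max\{16, 8/c\}$ makes each of the two terms at most $(\varepsilon/4)^{\ell}$, so the $\ell$-th moment is $O(\varepsilon)^{\ell}$, which is the shape of bound a JL moment property asks for.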

Now we arrive at the main point of this section. Several algorithms for approximate linear
regression and best rank-$k$ approximation in [9] simply maintain SA as A is updated, where S
comes from the JL distribution with $\Omega(\log(1/\delta))$-wise independent $\pm 1/\sqrt{k}$ entries. In fact though,
their analyses of their algorithms only use the fact that this distribution satisfies the approximate
matrix product sketch guarantees of Theorem 19. Due to Theorem 19 though, we know that any
distribution satisfying the $(\varepsilon, \delta)$-JL moment condition gives an approximate matrix product sketch.
Thus, random Bernoulli matrices may be replaced with our sparse JL distributions in this work. We
now state some of the algorithmic results given in [9] and describe how our constructions provide
improvements in the update time (the time to process new columns, rows, or turnstile updates).

As in [9], when stating our results we will ignore the space and time complexities of storing and
evaluating the hash functions in our JL distributions. We discuss this issue later in Remark 23.
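
To illustrate why column sparsity governs the update time, the following is a minimal Python sketch of maintaining SA under turnstile updates. It is only an illustration under stated assumptions: the class name `SparseSketch` and the helper `sparse_column` are hypothetical, and each column of S is drawn with a seeded, fully random generator (one non-zero of value $\pm 1/\sqrt{s}$ in each of s blocks of k/s rows) rather than with the limited-independence hash functions used in our constructions.

```python
import random


class SparseSketch:
    """Maintains S*A for a d x n matrix A under turnstile updates.

    Assumption for illustration: column i of S has s non-zero entries
    of value +-1/sqrt(s), one in each of s blocks of k/s rows, drawn
    from a seeded fully random generator instead of hash functions.
    """

    def __init__(self, k, s, n, seed=0):
        assert k % s == 0, "k must be divisible by s"
        self.k, self.s, self.n, self.seed = k, s, n, seed
        self.SA = [[0.0] * n for _ in range(k)]  # the k x n sketch
        self._cols = {}  # cache of the columns of S generated so far

    def sparse_column(self, i):
        """Return the s non-zero (row, value) pairs of column i of S."""
        if i not in self._cols:
            rng = random.Random(self.seed * 1_000_003 + i)
            block = self.k // self.s
            scale = 1.0 / self.s ** 0.5
            self._cols[i] = [
                (r * block + rng.randrange(block),
                 rng.choice((-1.0, 1.0)) * scale)
                for r in range(self.s)
            ]
        return self._cols[i]

    def update(self, i, j, v):
        """Apply A[i][j] += v; only s entries of the sketch change."""
        for row, val in self.sparse_column(i):
            self.SA[row][j] += val * v


# Usage: two updates to the same entry of A that cancel each other.
sk = SparseSketch(k=8, s=2, n=3)
sk.update(i=5, j=1, v=2.0)
sk.update(i=5, j=1, v=-2.0)
```

Each call to `update` performs s additions, whereas a dense S would touch all k rows of column j of the sketch.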

5.1 Linear regression 

In this problem we have an $A \in \mathbb{R}^{d \times n}$ and $b \in \mathbb{R}^d$. We would like to compute a vector x such that
$\|Ax - b\|_F \le (1 + \varepsilon) \cdot \min_{x^*} \|Ax^* - b\|_F$ with probability $1 - \delta$. In [9], it is assumed that the entries
of A, b require $O(\log(nd))$ bits of precision to store precisely. Both A, b receive turnstile updates.

Theorem 3.2 of [9] proves that such an x can be computed with probability $1 - \delta$ from SA and
Sb, where S is drawn from a distribution that simultaneously satisfies both the $(1/2, \eta^{-r}\delta)$ and
$(\sqrt{\varepsilon/r}, \delta)$-JL moment properties for some fixed constant $\eta > 1$, and where $\mathrm{rank}(A) \le r \le n$. Thus
due to Remark 17, we have the following.

Theorem 21. There is a one-pass streaming algorithm for linear regression in the turnstile model
where one maintains a sketch of size $O(n^2 \varepsilon^{-1} \log(1/\delta) \log(nd))$. Processing each update requires
$O(n + \sqrt{n/\varepsilon} \cdot \log(1/\delta))$ arithmetic operations and hash function evaluations.

Theorem 21 improves the update complexity of [9], which was $O(n \varepsilon^{-1} \log(1/\delta))$.

5.2 Low rank approximation

In this problem, we have an $A \in \mathbb{R}^{d \times n}$ of rank $\rho$ with entries that require precision $O(\log(nd))$ to
store. We would like to compute the best rank-r approximation $A_r$ to A. We define
$\Delta_r \stackrel{\mathrm{def}}{=} \|A - A_r\|_F$ as the error of $A_r$. We relax the problem by only requiring that we compute a
matrix $A'_r$ such that $\|A - A'_r\|_F \le (1 + \varepsilon)\Delta_r$ with probability $1 - \delta$ over the randomness of the algorithm.

Two-pass algorithm: Theorem 4.4 of [9] gives a 2-pass algorithm where in the first pass, one
maintains SA where S is drawn from a distribution that simultaneously satisfies both the $(1/2, \eta^{-r}\delta)$
and $(\sqrt{\varepsilon/r}, \delta)$-JL moment properties. It is also assumed that $\rho \ge 2r + 1$. The first pass is thus
sped up again as in Theorem 21.

One-pass algorithm for column/row-wise updates: Theorem 4.5 of [9] gives a one-pass
algorithm in the case that A is seen either one whole column or row at a time. The algorithm
maintains both SA and $SAA^T$ where S is drawn from a distribution that simultaneously satisfies
both the $(1/2, \eta^{-r}\delta)$ and $(\sqrt{\varepsilon/r}, \delta)$-JL moment properties. This implies the following.

Theorem 22. There is a one-pass streaming algorithm for approximate low rank approximation
with row/column-wise updates where one maintains a sketch of size $O(r \varepsilon^{-1} (n + d) \log(1/\delta) \log(nd))$.
Processing each update requires $O(r + \sqrt{r/\varepsilon} \cdot \log(1/\delta))$ amortized arithmetic operations and hash
function evaluations per entry of A.

Theorem 22 improves the amortized update complexity of [9], which was $O(r \varepsilon^{-1} \log(1/\delta))$.

Three-pass algorithm for row-wise updates: Theorem 4.6 of [9] gives a three-pass algorithm
using less space in the case that A is seen one row at a time. Again, the first pass simply maintains
SA where S is drawn from a distribution that satisfies both the $(1/2, \eta^{-r}\delta)$ and $(\sqrt{\varepsilon/r}, \delta)$-JL
moment properties. This pass is sped up using our sparser JL distribution.

One-pass algorithm in the turnstile model, bi-criteria: Theorem 4.7 of [9] gives a one-pass
algorithm under turnstile updates where SA and $RA^T$ are maintained in the stream. S is drawn
from a distribution satisfying both the $(1/2, \eta^{-r\log(1/\delta)/\varepsilon}\delta)$ and $(\varepsilon/\sqrt{r \log(1/\delta)}, \delta)$-JL moment
properties. R is drawn from a distribution satisfying both the $(1/2, \eta^{-r}\delta)$ and $(\sqrt{\varepsilon/r}, \delta)$-JL moment
properties. Theorem 4.7 of [9] then shows how to compute a matrix of rank $O(r \varepsilon^{-1} \log(1/\delta))$ which
achieves the desired error guarantee given SA and $RA^T$.

One-pass algorithm in the turnstile model: Theorem 4.9 of [9] gives a one-pass algorithm
under turnstile updates where SA and $RA^T$ are maintained in the stream. S is drawn from a
distribution satisfying both the $(1/2, \eta^{-r\log(1/\delta)/\varepsilon^2}\delta)$ and $(\varepsilon\sqrt{\varepsilon/(r \log(1/\delta))}, \delta)$-JL moment properties. R
is drawn from a distribution satisfying both the $(1/2, \eta^{-r}\delta)$ and $(\sqrt{\varepsilon/r}, \delta)$-JL moment properties.
Theorem 4.9 of [9] then shows how to compute a matrix of rank r which achieves the desired error
guarantee given SA and $RA^T$.

Remark 23. In the algorithms above, we counted the number of hash function evaluations that
must be performed. We use our construction in Figure 1(c), which uses $2\log(1/\delta)$-wise independent
hash functions. Standard constructions of t-wise independent hash functions over universes with
elements fitting in a machine word require $O(t)$ time to evaluate [7]. In our case, this would blow
up our update time by factors such as n or r, which could be large. Instead, we use fast multipoint
evaluation of polynomials. The standard construction [7] of our desired hash functions mapping
some domain [z] onto itself for z a power of 2 takes a degree-$(t - 1)$ polynomial p with random
coefficients in $\mathbb{F}_z$. The hash function evaluation at some point y is then the evaluation $p(y)$ over
$\mathbb{F}_z$. Theorem 24 below states that p can be evaluated at t points in total time $\tilde{O}(t)$. We note that
in the theorems above, we are always required to evaluate some t-wise independent hash function
on many more than t points per stream update. Thus, we can group these evaluation points into
groups of size t and then perform fast multipoint evaluation for each group. We borrow this idea from
[20], which used it to give a fast algorithm for moment estimation in data streams.

Theorem 24 ([29, Ch. 10]). Let R be a ring, and let $q \in R[x]$ be a degree-t polynomial. Then, given
distinct $x_1, \ldots, x_t \in R$, all the values $q(x_1), \ldots, q(x_t)$ can be computed using $O(t \log^2 t \log\log t)$
operations over R.
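
As a rough illustration of the polynomial hash functions discussed in Remark 23, the following Python sketch draws a random polynomial of degree at most t − 1 and evaluates it on a batch of points. Two departures from the text are assumptions made purely for simplicity: it works over a prime field (the constant `P` below) rather than $\mathbb{F}_z$ with z a power of 2, and `evaluate_batch` is a naive stand-in that uses Horner's rule point by point rather than the fast multipoint evaluation of Theorem 24. All function names are hypothetical.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; assumed field size for this sketch


def make_poly_hash(t, rng):
    """Draw a random polynomial of degree at most t-1 over F_P.

    Its values at any t distinct points of F_P are independent and
    uniform, so the resulting family is t-wise independent.
    Coefficients are stored lowest degree first.
    """
    return [rng.randrange(P) for _ in range(t)]


def evaluate(coeffs, y):
    """Evaluate the polynomial at y with Horner's rule: O(t) operations."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * y + c) % P
    return acc


def evaluate_batch(coeffs, points):
    """Naive stand-in for fast multipoint evaluation.

    This costs O(t) per point. A fast multipoint routine would process
    the points in groups of size t at an amortized cost that is only
    polylogarithmic in t per point.
    """
    return [evaluate(coeffs, y) for y in points]


# Usage: a 4-wise independent hash evaluated on a batch of keys.
h = make_poly_hash(4, random.Random(0))
values = evaluate_batch(h, [10, 11, 12, 13, 14])
```

The batching interface is the relevant design point: since each stream update needs many hash values at once, the evaluation points can be handed over in groups, which is what makes a multipoint routine applicable.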

Appendix

A On the sparsity required in various schemes 

In this section we show that sparsity $\Omega(\varepsilon^{-1} \log(1/\delta))$ is required in Figure 1(b) and Figure 1(c),
even if the hash functions used are completely random. We also show that sparsity $\tilde{\Omega}(\varepsilon^{-1} \log^2(1/\delta))$
is required in the DKS construction (Figure 1(a)), nearly matching the upper bounds of [6, 19].
Interestingly, all three of our proofs of (near-)tightness of analyses for these three constructions use
the same hard input vectors. In particular, if $s = o(1/\varepsilon)$, then we show that a vector with
$t = \lfloor 1/(s\varepsilon) \rfloor$ entries each of value $1/\sqrt{t}$ incurs large distortion with large probability. If $s = \Omega(1/\varepsilon)$ but
is still not sufficiently large, we show that the vector $(1/\sqrt{2}, 1/\sqrt{2}, 0, \ldots, 0)$ incurs large distortion
with large probability (in fact, for the DKS scheme one can even take the vector $(1, 0, \ldots, 0)$).

A.1 Near-tightness for DKS Construction

The main theorem of this section is the following. 

Theorem 25. The DKS construction of [10] requires sparsity $s = \Omega(\varepsilon^{-1} \cdot \lceil \log^2(1/\delta)/\log^2(1/\varepsilon) \rceil)$
to achieve distortion $1 \pm \varepsilon$ with success probability $1 - \delta$.

Before proving Theorem 25, we recall the DKS construction (Figure 1(a)). First, we replicate
each coordinate s times while preserving the $\ell_2$ norm. That is, we produce the vector
$\tilde{x} = (x_1, \ldots, x_1, x_2, \ldots, x_2, \ldots, x_d, \ldots, x_d)/\sqrt{s}$, where each $x_i$ is replicated s times. Then, pick
a random $k \times ds$ embedding matrix A for $k = C\varepsilon^{-2}\log(1/\delta)$ where each column has exactly one
non-zero entry, in a location defined by some random function $h : [ds] \to [k]$, and where this
non-zero entry is $\pm 1$, determined by some random function $\sigma : [ds] \to \{-1, 1\}$. The value $C > 0$ is some
fixed constant. The final embedding is A applied to $\tilde{x}$. We are now ready to prove Theorem 25.
The proof is similar to that of Theorem 28.
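
The construction just recalled can be sketched in a few lines of Python. This is only an illustration: the function names `dks_embed` and `sq_norm` are hypothetical, and h and $\sigma$ are realized by a fully random generator (fresh randomness per replicated copy) rather than by any particular hash family.

```python
import math
import random


def dks_embed(x, k, s, rng):
    """Apply one random draw of the DKS-style embedding to the vector x.

    Each coordinate x_i is replicated s times and scaled by 1/sqrt(s);
    every copy is added, with a random sign, to one random bucket of
    the k-dimensional output.
    """
    out = [0.0] * k
    scale = 1.0 / math.sqrt(s)
    for xi in x:
        if xi == 0.0:
            continue  # copies of a zero coordinate contribute nothing
        for _ in range(s):
            bucket = rng.randrange(k)        # plays the role of h
            sign = rng.choice((-1.0, 1.0))   # plays the role of sigma
            out[bucket] += sign * xi * scale
    return out


def sq_norm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(c * c for c in v)


# Usage: embed a unit vector and inspect how far its norm moved.
rng = random.Random(0)
y = dks_embed([0.6, 0.8], k=64, s=4, rng=rng)
distortion = abs(sq_norm(y) - 1.0)
```

Collisions between copies that land in the same bucket are the only source of distortion, which is why the lower bound arguments in this section count collisions.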

Our proof will use the following standard fact. 

Fact 26 ([21, Proposition B.3]). For all $t, n \in \mathbb{R}$ with $n \ge 1$ and $|t| \le n$,

\[
e^t (1 - t^2/n) \le (1 + t/n)^n \le e^t.
\]

Proof (of Theorem 25). First suppose $s \le 1/(2\varepsilon)$. Consider a vector with $t = \lfloor 1/(s\varepsilon) \rfloor$ non-zero
coordinates each of value $1/\sqrt{t}$. If there is exactly one pair $\{i, j\}$ that collides under h,
and furthermore the signs agree under $\sigma$, the $\ell_2$ norm squared of our embedded vector will be
$(st - 2)/(st) + 4/(st)$. Since $1/(st) \ge \varepsilon$, this quantity is at least $1 + 2\varepsilon$. The event of exactly one
pair $\{i, j\}$ colliding occurs with probability

\[
\binom{st}{2} \cdot \frac{1}{k} \cdot \prod_{i=0}^{st-2} (1 - i/k) \ \ge\ \Omega\left(\frac{1}{\log(1/\delta)}\right) \cdot (1 - \varepsilon/2)^{1/\varepsilon} = \Omega(1/\log(1/\delta)),
\]

which is much larger than $\delta/2$ for $\delta$ smaller than some constant. Now, given a collision, the colliding
items have the same sign with probability 1/2.

We next consider the case $1/(2\varepsilon) < s < 4/\varepsilon$. Consider the vector $x = (1, 0, \ldots, 0)$. If there
are exactly three pairs $\{i_1, j_1\}, \ldots, \{i_3, j_3\}$ that collide under h in three distinct target coordinates,
and furthermore the signs agree under $\sigma$, the $\ell_2$ norm squared of our embedded vector will be
$(s - 6)/s + 12/s > 1 + 3\varepsilon/2$. The event of three pairs colliding occurs with probability
$\Omega(1/\log^3(1/\delta))$, which is much larger than $\delta/2$ for $\delta$ smaller than some constant. Now, given a
collision, the colliding items have the same sign with probability 1/8.

We lastly consider the case $4/\varepsilon \le s \le 2c\varepsilon^{-1}\log^2(1/\delta)/\log^2(1/\varepsilon)$ for some constant $c > 0$
(depending on C) to be determined later. First note this case only exists when $\delta = O(\varepsilon)$. Define
$x = (1, 0, \ldots, 0)$. Suppose there exists an integer q so that

1. $q^2/s \ge 4\varepsilon$

2. $q/s < \varepsilon$

3. $(s/(qk))^q (1 - 1/k)^s > \delta^{1/3}$.

First we show it is possible to satisfy the above conditions simultaneously for our range of s.
We set $q = 2\sqrt{\varepsilon s}$, satisfying item 1 trivially, and item 2 since $s > 4/\varepsilon$. For item 3, Fact 26 gives

\[
(s/(qk))^q \cdot (1 - 1/k)^s \ \ge\ \left(\frac{s}{qk}\right)^{q} \cdot e^{-s/k} \cdot \left(1 - \frac{s}{k^2}\right).
\]

The $e^{-s/k} \cdot (1 - (s/k^2))$ term is at least $\delta^{1/6}$ by the settings of s, k, and the $(s/(qk))^q$ term is also
at least $\delta^{1/6}$ for c sufficiently small.

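Now, consider the event $\mathcal{E}$ that exactly q of the s copies of $x_1$ are hashed to 1 by h, and to +1
by $\sigma$. If $\mathcal{E}$ occurs, then coordinate 1 in the target vector contributes $q^2/s \ge 4\varepsilon$ to $\ell_2^2$ in the target
vector by item 1 above, whereas these coordinates only contribute $q/s < \varepsilon$ to $\|x\|_2^2$ by item 2 above,
thus causing error at least $3\varepsilon$. Furthermore, the $s - q$ coordinates which do not hash to 1 are being
hashed to a vector of length $k - 1 = \omega(1/\varepsilon^2)$ with random signs, and thus these coordinates have
their $\ell_2^2$ contribution preserved up to $1 \pm o(\varepsilon)$ with constant probability by Chebyshev's inequality.
It thus just remains to show that $\Pr[\mathcal{E}] \gg \delta$. We have

\[
\Pr[\mathcal{E}] = \binom{s}{q} \cdot k^{-q} \cdot \left(1 - \frac{1}{k}\right)^{s-q} \cdot \frac{1}{2^q}
\ \ge\ \left(\frac{s}{qk}\right)^{q} \left(1 - \frac{1}{k}\right)^{s} \cdot \frac{1}{2^q}
\ >\ \delta^{1/3} \cdot \frac{1}{2^q}.
\]

The $2^{-q}$ term is $\omega(\delta^{1/3})$ and thus overall $\Pr[\mathcal{E}] = \omega(\delta^{2/3}) \gg \delta$. ■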



A.2 Tightness of Figure 1(b) analysis

Theorem 27. For $\delta$ smaller than a constant depending on C for $k = C\varepsilon^{-2}\log(1/\delta)$, the scheme
of Section 4.2 requires $s = \Omega(\varepsilon^{-1}\log(1/\delta))$ to obtain distortion $1 \pm \varepsilon$ with probability $1 - \delta$.

Proof. First suppose $s \le 1/(2\varepsilon)$. We consider a vector with $t = \lfloor 1/(s\varepsilon) \rfloor$ non-zero coordinates
each of value $1/\sqrt{t}$. If there is exactly one set i, j, r with $i \ne j$ such that $S_{r,i}, S_{r,j}$ are both non-zero
for the embedding matrix S (i.e., there is exactly one collision), then the total error is $2/(ts) \ge 2\varepsilon$.
It just remains to show that this happens with probability larger than $\delta$. The probability of this
occurring is

\[
s^2 \binom{t}{2} \cdot \frac{1}{k} \cdot \frac{k-s}{k-1} \cdots \frac{k-2s+2}{k-s+1} \cdot \frac{(k-2s+1)!}{(k-ts+1)!} \cdot \left(\frac{(k-s)!}{k!}\right)^{t-2}
\ \ge\ \frac{s^2 t(t-1)}{2k} \cdot \left(\frac{k-st}{k}\right)^{st}
= \Omega(1/\log(1/\delta)).
\]

Now consider the case $1/(2\varepsilon) < s < c \cdot \varepsilon^{-1}\log(1/\delta)$ for some small constant c. Consider the
vector $(1/\sqrt{2}, 1/\sqrt{2}, 0, \ldots, 0)$. Suppose there are exactly $2s\varepsilon$ collisions, i.e. $2s\varepsilon$ distinct values of
r such that $S_{r,1}, S_{r,2}$ are both non-zero (to avoid tedium we disregard floors and ceilings and just
assume $s\varepsilon$ is an integer). Also, suppose that in each colliding row r we have $\sigma(1, r) = \sigma(2, r)$. Then,
the total error would be $2\varepsilon$. It just remains to show that this happens with probability larger than
$\delta$. The probability of signs agreeing in exactly $2\varepsilon s$ chunks is $2^{-2\varepsilon s} > 2^{-2c\log(1/\delta)}$, which is larger
than $\sqrt{\delta}$ for $c < 1/4$. The probability of exactly $2\varepsilon s$ collisions is

\[
\binom{s}{2\varepsilon s} \cdot \prod_{i=0}^{2\varepsilon s - 1} \frac{s-i}{k-i} \cdot \prod_{i=0}^{s - 2\varepsilon s - 1} \frac{k-s-i}{k-2\varepsilon s-i}
\ \ge\ \left(\frac{s}{4\varepsilon k}\right)^{2\varepsilon s} \left(1 - \frac{2s}{k}\right)^{s}. \qquad (13)
\]

It suffices for the right hand side to be at least $\sqrt{\delta}$ since h is independent of $\sigma$, and thus the
total probability of error larger than $2\varepsilon$ would be greater than $\sqrt{\delta}^2 = \delta$. Taking natural logarithms,
it suffices to have

\[
2\varepsilon s \ln\left(\frac{4\varepsilon k}{s}\right) - s \ln\left(1 - \frac{2s}{k}\right) \le \ln(1/\delta)/2.
\]

Writing $s = q/\varepsilon$ and $a = 4C\log(1/\delta)$, the left hand side is $2q\ln(a/q) + \Theta(s^2/k)$. Taking a derivative
shows $2q\ln(a/q)$ is monotonically increasing for $q < a/e$, where e is the base of the natural logarithm.
Thus as long as $q < ca$ for a sufficiently small constant c, $2q\ln(a/q) < \ln(1/\delta)/4$. Also, the
$\Theta(s^2/k)$ term is at most $\ln(1/\delta)/4$ for c sufficiently small. ■
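
The error computation for the vector $(1/\sqrt{2}, 1/\sqrt{2}, 0, \ldots, 0)$ used in the proof above can be checked numerically. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, a column is modeled as s distinct uniformly random rows with fully random signs, and non-zero entries of S are taken to be $\pm 1/\sqrt{s}$. It compares the closed form (agreeing collisions minus disagreeing collisions, divided by s) against a direct evaluation of $\|Sx\|_2^2 - 1$.

```python
import random


def sample_column(k, s, rng):
    """Model one column of S: s distinct random rows, each with a sign."""
    rows = rng.sample(range(k), s)
    return {r: rng.choice((-1, 1)) for r in rows}


def squared_norm_error(col1, col2, s):
    """Closed form for ||Sx||^2 - 1 with x = (1/sqrt2, 1/sqrt2, 0, ..., 0).

    A row hit by only one column contributes 1/(2s). A row hit by both
    contributes (sign1 + sign2)^2 / (2s): 2/s if the signs agree and 0
    if they disagree, against 1/s when the cross term is ignored.
    """
    agree = sum(1 for r in col1 if r in col2 and col1[r] == col2[r])
    disagree = sum(1 for r in col1 if r in col2 and col1[r] != col2[r])
    return (agree - disagree) / s


def direct_error(col1, col2, k, s):
    """Compute ||Sx||^2 - 1 by forming Sx explicitly."""
    y = [0.0] * k
    for col in (col1, col2):
        for r, sign in col.items():
            y[r] += sign / (2 * s) ** 0.5  # (sign/sqrt(s)) * (1/sqrt(2))
    return sum(v * v for v in y) - 1.0


# Usage: two collisions with agreeing signs out of s = 4 give error 2/4.
c1 = {0: 1, 1: 1, 2: -1, 3: 1}
c2 = {0: 1, 1: 1, 5: 1, 6: -1}
err = squared_norm_error(c1, c2, 4)
```

With $2\varepsilon s$ agreeing collisions and no disagreeing ones, the closed form gives error $2\varepsilon$, matching the claim in the proof.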



A.3 Tightness of Figure 1(c) analysis

Theorem 28. For $\delta$ smaller than a constant depending on C for $k = C\varepsilon^{-2}\log(1/\delta)$, the scheme
of Section 4.1 requires $s = \Omega(\varepsilon^{-1}\log(1/\delta))$ to obtain distortion $1 \pm \varepsilon$ with probability $1 - \delta$.



Proof. First suppose $s \le 1/(2\varepsilon)$. Consider a vector with $t = \lfloor 1/(s\varepsilon) \rfloor$ non-zero coordinates each
of value $1/\sqrt{t}$. If there is exactly one set i, j, r with $i \ne j$ such that $h(i, r) = h(j, r)$ (i.e. exactly
one collision), then the total error is $2/(ts) \ge 2\varepsilon$. It just remains to show that this happens with
probability larger than $\delta$.

The probability of exactly one collision is

\[
\begin{aligned}
s \cdot \left[\frac{t! \binom{k/s}{t}}{(k/s)^t}\right]^{s-1} \cdot \binom{t}{2} \cdot \frac{(k/s) \cdot (t-2)! \cdot \binom{k/s - 1}{t-2}}{(k/s)^t}
&\ge \frac{s^2 t(t-1)}{2k} \cdot \left(1 - \frac{st}{k}\right)^{t(s-1)} \cdot \left(1 - \frac{st}{k}\right)^{t-2} \\
&\ge \frac{s^2 t(t-1)}{2k} \cdot \left(1 - \frac{s^2 t^2}{k}\right) \\
&= \Omega(1/\log(1/\delta)),
\end{aligned}
\]

which is larger than $\delta$ for $\delta$ smaller than a universal constant.

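Now consider $1/(2\varepsilon) < s < c \cdot \varepsilon^{-1}\log(1/\delta)$ for some small constant c. Consider the vector
$x = (1/\sqrt{2}, 1/\sqrt{2}, 0, \ldots, 0)$. Suppose there are exactly $2s\varepsilon$ collisions, i.e. $2s\varepsilon$ distinct values of r
such that $h(1, r) = h(2, r)$ (to avoid tedium we disregard floors and ceilings and just assume $s\varepsilon$ is
an integer). Also, suppose that in each colliding chunk r we have $\sigma(1, r) = \sigma(2, r)$. Then, the total
error would be $2\varepsilon$. It just remains to show that this happens with probability larger than $\delta$. The
probability of signs agreeing in exactly $2\varepsilon s$ chunks is $2^{-2\varepsilon s} > 2^{-2c\log(1/\delta)}$, which is larger than $\sqrt{\delta}$
for $c < 1/4$. The probability of exactly $2\varepsilon s$ collisions is

\[
\binom{s}{2\varepsilon s} \left(\frac{s}{k}\right)^{2\varepsilon s} \left(1 - \frac{s}{k}\right)^{s - 2\varepsilon s}
\ \ge\ \left(\frac{s}{2\varepsilon k}\right)^{2\varepsilon s} \left(1 - \frac{s}{k}\right)^{s}.
\]

The above is at least $\sqrt{\delta}$, by the analysis following Eq. (13). Since h is independent of $\sigma$, the
total probability of having error larger than $2\varepsilon$ is greater than $\sqrt{\delta}^2 = \delta$. ■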